Preparations

Load the necessary libraries

library(car)       #for regression diagnostics
library(broom)     #for tidy output
library(ggfortify) #for model diagnostics
library(sjPlot)    #for outputs
library(knitr)     #for kable
library(effects)   #for partial effects plots
library(emmeans)   #for estimating marginal means
library(ggeffects) #for plotting marginal means
library(MASS)      #for glm.nb
library(MuMIn)     #for AICc
library(tidyverse) #for data wrangling
library(modelr)    #for auxillary modelling functions
library(DHARMa)    #for residual diagnostics plots
library(performance) #for residuals diagnostics
library(see)         #for plotting residuals
library(patchwork) #grid of plots
library(scales)    #for more scales

Scenario

An ecologist studying a rocky shore at Phillip Island, in southeastern Australia, was interested in how clumps of intertidal mussels are maintained (Quinn 1988). In particular, he wanted to know how densities of adult mussels affected recruitment of young individuals from the plankton. As with most marine invertebrates, recruitment is highly patchy in time, so he expected to find seasonal variation, and the interaction between season and density - whether effects of adult mussel density vary across seasons - was the aspect of most interest.

The data were collected from four seasons, and with two densities of adult mussels. The experiment consisted of clumps of adult mussels attached to the rocks. These clumps were then brought back to the laboratory, and the number of baby mussels recorded. There were 3-6 replicate clumps for each density and season combination.

Format of quinn.csv data files

SEASON DENSITY RECRUITS SQRTRECRUITS GROUP
Spring Low 15 3.87 SpringLow
.. .. .. .. ..
Spring High 11 3.32 SpringHigh
.. .. .. .. ..
Summer Low 21 4.58 SummerLow
.. .. .. .. ..
Summer High 34 5.83 SummerHigh
.. .. .. .. ..
Autumn Low 14 3.74 AutumnLow
.. .. .. .. ..
SEASON Categorical listing of Season in which mussel clumps were collected ­ independent variable
DENSITY Categorical listing of the density of mussels within mussel clump ­ independent variable
RECRUITS The number of mussel recruits ­ response variable
SQRTRECRUITS Square root transformation of RECRUITS - needed to meet the test assumptions
GROUPS Categorical listing of Season/Density combinations - used for checking ANOVA assumptions

Mussel

Read in the data

quinn = read_csv('../public/data/quinn.csv', trim_ws=TRUE)
glimpse(quinn)
## Rows: 42
## Columns: 5
## $ SEASON       <chr> "Spring", "Spring", "Spring", "Spring", "Spring", "Sprin…
## $ DENSITY      <chr> "Low", "Low", "Low", "Low", "Low", "High", "High", "High…
## $ RECRUITS     <dbl> 15, 10, 13, 13, 5, 11, 10, 15, 10, 13, 1, 21, 31, 21, 18…
## $ SQRTRECRUITS <dbl> 3.872983, 3.162278, 3.605551, 3.605551, 2.236068, 3.3166…
## $ GROUP        <chr> "SpringLow", "SpringLow", "SpringLow", "SpringLow", "Spr…
summary(quinn)
##     SEASON            DENSITY             RECRUITS      SQRTRECRUITS  
##  Length:42          Length:42          Min.   : 0.00   Min.   :0.000  
##  Class :character   Class :character   1st Qu.: 9.25   1st Qu.:3.041  
##  Mode  :character   Mode  :character   Median :13.50   Median :3.674  
##                                        Mean   :18.33   Mean   :3.871  
##                                        3rd Qu.:21.75   3rd Qu.:4.663  
##                                        Max.   :69.00   Max.   :8.307  
##     GROUP          
##  Length:42         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Since we intend to model both SEASON and DENSITY as categorical variables, we need to explicitly declair them as factors.

quinn = quinn %>% mutate(SEASON = factor(SEASON, levels=c('Spring', 'Summer', 'Autumn', 'Winter')),
                         DENSITY = factor(DENSITY))

Exploratory data analysis

Model formula: \[ \begin{align} y_i &\sim{} \mathcal{Pois}(\lambda_i)\\ ln(\mu_i) &= \boldsymbol{\beta} \bf{X_i}\\[1em] \end{align} \]

where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and effects of season, density and their interaction on mussel recruitment.

ggplot(quinn, aes(y=RECRUITS, x=SEASON, fill=DENSITY)) +
     geom_boxplot()

Conclusions:

  • there is clear evidence of non-homogeneity of variance
  • specifically, there is evidence that the variance is related to the mean in that boxplots that are lower on the y-axis (low mean) also have lower variance (shorter boxplots)
  • this might be expected for count data and we might consider that a Poisson distribution (which assumes that mean and variance are equal - and thus related in a very specific way).

Lets mimic the effect of using a log link, by using log scaled y-axis.

ggplot(quinn, aes(y=RECRUITS, x=SEASON, fill=DENSITY)) +
  geom_boxplot() +
  scale_y_log10()

Conclusions:

  • that is an improvement

Fit the model

Model validation

Different model

Parial plots

Model investigation / hypothesis testing

Predictions

Summary figures

library(pscl)
quinn.zip <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn,  dist='poisson')
#quinn.resid <- simulateResiduals(quinn.zip,  plot=TRUE)
summary(quinn.zip)
## 
## Call:
## zeroinfl(formula = RECRUITS ~ DENSITY * SEASON | 1, data = quinn, dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4510 -0.8645  0.1262  0.7225  2.8711 
## 
## Count model coefficients (poisson with log link):
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               2.3026     0.1291  17.836  < 2e-16 ***
## DENSITYLow                0.1133     0.1858   0.610  0.54191    
## SEASONSummer              1.5721     0.1419  11.081  < 2e-16 ***
## SEASONAutumn              0.6763     0.1586   4.266 1.99e-05 ***
## SEASONWinter             -0.5680     0.2147  -2.646  0.00815 ** 
## DENSITYLow:SEASONSummer  -0.8970     0.2134  -4.202 2.64e-05 ***
## DENSITYLow:SEASONAutumn  -0.1881     0.2381  -0.790  0.42957    
## DENSITYLow:SEASONWinter   0.2165     0.4534   0.478  0.63300    
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.0037     0.7305  -4.112 3.92e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 13 
## Log-likelihood: -151.3 on 9 Df
#tidy(quinn.zip,  conf.int=TRUE, exponentiate = TRUE)
exp(-3.0037)
## [1] 0.0496032
quinn.zip1 <- zeroinfl(RECRUITS ~ DENSITY*SEASON | SEASON, data=quinn,  dist='poisson')
                                        #quinn.resid <- simulateResiduals(quinn.zip)
summary(quinn.zip1)
## 
## Call:
## zeroinfl(formula = RECRUITS ~ DENSITY * SEASON | SEASON, data = quinn, 
##     dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5327 -1.0591  0.1537  0.9216  3.6831 
## 
## Count model coefficients (poisson with log link):
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               2.3026     0.1291  17.836  < 2e-16 ***
## DENSITYLow                0.1133     0.1858   0.610  0.54191    
## SEASONSummer              1.5721     0.1419  11.081  < 2e-16 ***
## SEASONAutumn              0.6763     0.1586   4.266 1.99e-05 ***
## SEASONWinter             -0.5680     0.2147  -2.646  0.00815 ** 
## DENSITYLow:SEASONSummer  -0.8970     0.2134  -4.202 2.64e-05 ***
## DENSITYLow:SEASONAutumn  -0.1881     0.2381  -0.790  0.42958    
## DENSITYLow:SEASONWinter   0.2291     0.4375   0.524  0.60045    
## 
## Zero-inflation model coefficients (binomial with logit link):
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -2.157e+01  1.454e+04  -0.001    0.999
## SEASONSummer -2.543e-08  2.013e+04   0.000    1.000
## SEASONAutumn -2.119e-08  2.107e+04   0.000    1.000
## SEASONWinter  2.031e+01  1.454e+04   0.001    0.999
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 13 
## Log-likelihood:  -148 on 12 Df
exp(-3.0037)
## [1] 0.0496032
quinn.zinb <- zeroinfl(RECRUITS ~ DENSITY*SEASON | 1, data=quinn,  dist='negbin')
AICc(quinn.zip,  quinn.zinb)
summary(quinn.zinb)
## 
## Call:
## zeroinfl(formula = RECRUITS ~ DENSITY * SEASON | 1, data = quinn, dist = "negbin")
## 
## Pearson residuals:
##        Min         1Q     Median         3Q        Max 
## -1.981e+00 -7.511e-01 -1.522e-05  5.289e-01  2.869e+00 
## 
## Count model coefficients (negbin with log link):
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               2.3026     0.1875  12.283  < 2e-16 ***
## DENSITYLow                0.1133     0.2742   0.413  0.67933    
## SEASONSummer              1.5721     0.2389   6.580 4.69e-11 ***
## SEASONAutumn              0.6763     0.2492   2.714  0.00664 ** 
## SEASONWinter             -0.5680     0.2881  -1.971  0.04869 *  
## DENSITYLow:SEASONSummer  -0.8969     0.3509  -2.556  0.01059 *  
## DENSITYLow:SEASONAutumn  -0.1881     0.3788  -0.496  0.61957    
## DENSITYLow:SEASONWinter  -0.8671     0.5338  -1.624  0.10432    
## Log(theta)                2.1997     0.4091   5.378 7.55e-08 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -15.29     453.38  -0.034    0.973
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 9.0223 
## Number of iterations in BFGS optimization: 43 
## Log-likelihood: -137.5 on 10 Df
exp(-15.29)
## [1] 2.288956e-07

References

Quinn, G. P. 1988. “Ecology of the Intertidal Pulmonate Limpet Siphonaria Diemenensis Quoy et Gaimard. II Reproductive Patterns and Energetics.” Journalofexperimentalmarinebiologyandecology 117: 137–56.